Abstract. We investigate the special class of formulas made up of arbitrary but finite
combinations of addition, multiplication, and exponentiation gates. The inputs to these formulas
are restricted to the integral unit 1. In connection with such formulas, we describe two
essentially distinct families of canonical formula-encodings for integers, respectively deduced
from the decimal encoding and the fundamental theorem of arithmetic. Our main contribution is the
detailed description of two algorithms which efficiently determine the canonical formula-encodings
associated with relatively large sets of consecutive integers.

1. Introduction 

It is a well-known fact that the binary encoding is on average optimal for representing integers.
However, if we think of a binary string as a computer program, it follows that such a program
implicitly describes a circuit which evaluates to the corresponding integer. When literally
interpreted, the binary representation describes a sum of powers of two, with the powers
determined by the location of the bits. The recursive encoding which explicitly describes the
circuit representation associated with the literal interpretation of the decimal strings was
pioneered by Goodstein [4]. In the current discussion we depart slightly from conventional
arithmetic circuit models [1, 2] in that we consider circuits, or more specifically formulas,
which combine fan-in two exponentiation, multiplication, and addition gates with input restricted
to the integral unit 1. While at first it might seem unnatural to allow exponentiation gates, we
argue that exponentiation gates are implicit in the decimal encoding. Furthermore, exponentiation
gates are critical for obtaining small circuits which evaluate to the integers specified by the
input binary strings. Throughout the discussion the arithmetic formulas will be described with
symbolic expressions, and for convenience we associate with the symbol x the recurring formula
(1 + 1). Our main contribution is an asymptotically optimal algorithm for finding Goodstein
formula-encodings for relatively large subsets of consecutive integers. Finally, we describe an
alternative canonical formula-encoding which is conjecturally smaller on average than the
Goodstein formula-encoding. We also provide an efficient algorithm for computing the latter
formula-encodings for relatively large subsets of consecutive integers.

2. The Set of Formula-Encodings of Positive Integers 

Let $\mathcal{E}$ denote the set of symbolic expressions which result from finite combinations of
addition, multiplication, and exponentiation where the only input is 1. For instance
(abbreviating, for convenience, $x = 1 + 1$)

(1)    $x^{(1 + (x \cdot x \cdot x) + (x^{x} \cdot x))} + 1^{(x^{1})} + 1^{(1^{1})} + 1^{(x^{(x^{1})})} + 1 \in \mathcal{E}$.
Elements of the set $\mathcal{E}$ will, for our purposes, be encoded as strings over the alphabet $\mathfrak{A}$

(2)    $\mathfrak{A} := \{1, +, \cdot, \wedge\}$.

For the reader's convenience we adopt the infix notation, thereby making use of the parenthesis
characters '(' and ')'. We point out, however, that the parenthesis characters can be omitted
from the alphabet $\mathfrak{A}$, since both the postfix and prefix notations avoid their use entirely.
FIGURE 1. Illustration of a formula operation.



Of course, every such expression evaluates to a positive integer, and we are interested in the
shortest possible expression representing any given positive integer, or at least (for large
integers) one as close to the shortest as possible. The evaluation function $E$ is defined
recursively, starting from $E(1) = 1$, by

(3)    $E(a + b) = E(a) + E(b), \quad E(a \cdot b) = E(a) \cdot E(b), \quad E(a^{b}) = E(a)^{E(b)}$.

One can introduce "axioms" that transform one tree to another without changing its value, but 
these are left to the reader. 
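
As an illustration of the evaluation rules (3), the following Python sketch evaluates a formula
and serializes it in postfix notation, which needs no parentheses. The nested-tuple
representation and the names `Formula`, `X`, `evaluate`, and `to_postfix` are assumptions made
for this example and do not appear in the original text; the character `*` stands for the
multiplication gate and `^` for the exponentiation gate.

```python
from typing import Tuple, Union

# Assumed representation: a formula is the literal 1 or a tuple (gate, left, right)
# with gate in {"+", "*", "^"}.
Formula = Union[int, Tuple[str, "Formula", "Formula"]]

X: Formula = ("+", 1, 1)  # the recurring formula x = (1 + 1)


def evaluate(f: Formula) -> int:
    """Evaluate a formula with the recursive rules of equation (3)."""
    if f == 1:
        return 1  # the only admissible input
    gate, a, b = f
    if gate == "+":
        return evaluate(a) + evaluate(b)
    if gate == "*":
        return evaluate(a) * evaluate(b)
    if gate == "^":
        return evaluate(a) ** evaluate(b)
    raise ValueError(f"unknown gate: {gate!r}")


def to_postfix(f: Formula) -> str:
    """Serialize a formula as a parenthesis-free postfix string."""
    if f == 1:
        return "1"
    gate, a, b = f
    return to_postfix(a) + to_postfix(b) + gate
```

For instance, the formula $x^{(x + 1)}$ is written `("^", X, ("+", X, 1))` in this
representation; it evaluates to 8 and its postfix string is `11+11+1+^`.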

3. Canonical forms 

We shall crucially require the notion of canonical form expressions, or canonical form formulas.
These are elements of $\mathcal{E}$ which we think of as unambiguous representatives of the
corresponding integer. We discuss here two important canonical forms. We point out, however, that
our choices of canonical forms are somewhat arbitrary, and alternative choices of representatives
could be made.

3.1. The First Canonical Form. An expression $f \in \mathcal{E}$ is in the First Canonical Form
(FCF) if $f$ corresponds to a finite sum of the form (recall that $x$ is short for $1 + 1$)

(4)    $f = \sum_{k} x^{f_k}$   or   $f = 1 + \sum_{k} x^{f_k}$

such that the expressions $f_k$ are distinct for distinct values of the index $k$, and each of
the expressions $f_k \in \mathcal{E}$ is itself in the FCF.

Proposition 1: An arbitrary $f \in \mathcal{E}$ is either in the FCF or can be transformed into an
expression in the FCF via a finite sequence of transformations which preserve the evaluation value.

Proposition 2: Every expression $f \in \mathcal{E}$ is a member of a finite set of non-trivial
equivalent expressions (i.e. expressions not including any subexpression of the form $g^{1}$,
$1^{g}$, or $g \cdot 1$ for some arbitrary expression $g$).

Proof: A constructive proof of Propositions 1 and 2 readily follows from the quotient-remainder
theorem.
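
The constructive argument can be sketched in Python: reading off the binary digits of an integer,
and then recursively those of each exponent, yields an FCF expression. The names `fcf` and
`paren`, and the plain-text output syntax (`^` for exponentiation, `x` for 1 + 1, and `x` written
in place of the trivial power `x^1`), are choices made for this sketch rather than notation fixed
by the text.

```python
def paren(s: str) -> str:
    """Parenthesize an expression unless it is a single symbol."""
    return s if s in ("1", "x") else f"({s})"


def fcf(n: int) -> str:
    """Return a First Canonical Form string for the positive integer n."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    terms = []
    # Scan the binary digits from the highest set bit down to bit 1.
    for k in range(n.bit_length() - 1, 0, -1):
        if (n >> k) & 1:
            # The exponent k is itself encoded recursively in the FCF.
            terms.append("x" if k == 1 else f"x^{paren(fcf(k))}")
    if n & 1:
        terms.append("1")  # the optional summand 1 of equation (4)
    return " + ".join(terms)
```

With these conventions `fcf(5)` gives `x^x + 1` and `fcf(8)` gives `x^(x + 1)`. Iterating over
consecutive integers in this way recomputes the same exponents many times, which is the
inefficiency that a set recurrence avoids.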

3.2. The Second Canonical Form. An expression $f \in \mathcal{E}$ is in the Second Canonical Form
(SCF) if $f$ corresponds to a finite product of the form

(5)    $f = \prod_{k} \left[ (1 + f_k)^{g_k} \right]$   or   $f = (x^{g}) \cdot \prod_{k} \left[ (1 + f_k)^{g_k} \right]$

where, for distinct values of the index $k$, the formulas $(1 + f_k)$ encode distinct primes
greater than 2. Furthermore, $f_k, g_k, g \in \mathcal{E}$ are themselves expressions in the SCF.

Proposition 3: An arbitrary $f \in \mathcal{E}$ is either in the SCF or can be transformed into an
expression in the SCF via a finite sequence of transformations which preserve the evaluation value.

Proof: A constructive proof of Proposition 3 immediately follows from the fundamental theorem
of arithmetic.
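
A direct, unoptimized way to obtain an SCF expression for a single integer is to factor it and
recurse on the exponents and on p - 1 for each odd prime p. The Python sketch below does this
with trial division. The names `factorize`, `paren`, and `scf`, the plain-text syntax (`^`, `*`,
`x` for 1 + 1), the omission of trivial exponents equal to 1, and the ordering of factors by
increasing prime are all assumptions of this example.

```python
def factorize(n: int) -> dict:
    """Trial-division factorization, returned as {prime: exponent}."""
    factors, p = {}, 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors


def paren(s: str) -> str:
    """Parenthesize an expression unless it is a single symbol."""
    return s if s in ("1", "x") else f"({s})"


def scf(n: int) -> str:
    """Return a Second Canonical Form string for the positive integer n."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    if n == 1:
        return "1"
    factors = factorize(n)
    if n > 2 and factors == {n: 1}:
        # An odd prime is written 1 + f, with f the SCF of n - 1.
        return f"1 + {scf(n - 1)}"
    parts = []
    for p, e in sorted(factors.items()):
        base = "x" if p == 2 else paren(scf(p))
        parts.append(base if e == 1 else f"{base}^{paren(scf(e))}")
    return " * ".join(parts)
```

Under these conventions `scf(6)` gives `x * (1 + x)` and `scf(9)` gives `(1 + x)^x`. Factoring
each integer separately is costly, which motivates computing the encodings of many consecutive
integers at once.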
A considerable advantage of the SCF as a default encoding for integers is that it considerably
simplifies the computational complexity analysis of formula arithmetic. In particular, the
complexity analysis of formula arithmetic (multiplication and exponentiation) reduces to the
analysis of the formula addition operation. Furthermore, it has been empirically observed that
expressions describing formulas in the SCF have smaller expected length than their FCF
counterparts. We further remark that the SCF is implicit in the discussion of integer prime tower
encodings [3].

4. Computing FCF integer encodings 

We describe here an asymptotically optimal algorithm for determining symbolic expressions
which describe FCF integer formula-encodings for relatively large sets of consecutive integers.
The algorithm is based on the observation that, given the FCF encodings of the first $n$ positive
integers, one easily deduces from them the FCF encodings of the next $2^{n} - n$ positive
integers. We pointed out earlier that the FCF encoding describes formulas corresponding to the
Goodstein base 2 recursive (or hereditary) integer encoding [3]. It is clear, however, that an
attempt to uncover the FCF encoding of integers by simply iterating through consecutive integers
and recursively expressing the powers of 2 in binary form would yield a very inefficient
algorithm. Our proposed algorithm for determining FCF integer formula-encodings instead amounts
to a set recurrence. The initial set for the recurrence is specified by

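$N_0 := \{1\}$

and the set recursion is defined by

$N_{k+1} = \left\{ \sum_{s \in S} s \right\}_{\emptyset \ne S \subseteq \left( \{1\} \cup x^{N_k} \right)}$

where, for an arbitrary symbolic expression $f$ and a set of symbolic expressions $L$, the set
$f^{L}$ is to be interpreted as

$f^{L} := \left\{ f^{l} \right\}_{l \in L}$.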

Three iterations of the set recurrence yield 

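$\mathrm{FCF}_3 := \left\{ 1,\; x,\; (x + 1),\; x^{x},\; (x^{x} + 1),\; \cdots,\; \left( x^{(x^{x+1} + x^{x} + x + 1)} + \cdots + x^{(x^{x} + x + 1)} + x^{(x^{x} + x)} + x^{(x^{x} + 1)} + \cdots + x^{(x + 1)} + \cdots + 1 \right) \right\}$.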

It follows from the definition of the set recurrence that the proposed algorithm requires
$O(\log^{*} n)$ iterations to produce FCF formulas for all positive integers less than $n$, and
that it requires an optimal $O(n)$ symbolic expression manipulations.
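
The set recurrence can be sketched in Python under the reading that each iteration forms every
sum of distinct elements of $\{1\} \cup x^{N_k}$. The function names `fcf_step` and `fcf_sets`,
the dictionary from integer values to expression strings, and the plain-text syntax (`^` for
exponentiation, `x` written in place of `x^1`) are assumptions of this example. Each new
expression is obtained from an existing one by a single concatenation, in line with the linear
count of symbolic manipulations.

```python
def paren(s: str) -> str:
    """Parenthesize an expression unless it is a single symbol."""
    return s if s in ("1", "x") else f"({s})"


def fcf_step(prev: dict) -> dict:
    """One iteration of the recurrence; prev maps integer values to FCF strings."""
    # Elements of {1} ∪ x^{N_k} as (value, expression) pairs, in increasing value.
    atoms = [(1, "1")]
    for v in sorted(prev):
        atoms.append((2 ** v, "x" if v == 1 else f"x^{paren(prev[v])}"))
    sums = {0: ""}  # the empty subset, discarded before returning
    for val, expr in atoms:
        # Every subset built so far is kept, and also extended by the new atom.
        for v, e in list(sums.items()):
            sums[v + val] = expr if e == "" else f"{expr} + {e}"
    del sums[0]
    return sums


def fcf_sets(iterations: int) -> dict:
    """Apply the recurrence the given number of times, starting from N_0 = {1}."""
    current = {1: "1"}
    for _ in range(iterations):
        current = fcf_step(current)
    return current
```

Two iterations, `fcf_sets(2)`, cover the integers 1 through 15, and a third iteration covers 1
through 65535, which shows how quickly the number of required iterations saturates.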

5. Zeta recursion

We recall here an elementary recursion called the Zeta recursion, so named for its close
resemblance to the Zeta summation formula. It was first introduced in [3] as a combinatorial
construction for sifting primes. Starting from the initial sets

(6)    $P_0 := \{x\}, \quad N_0 := P_0 \cup \{1\}$,

we consider the set recurrence relation defined by

(7)    $\widetilde{N}_{k+1} = \prod_{p \in P_k} \left( \{1\} \cup p^{N_k} \right)$,

where

$p^{N_k} := \left\{ p^{n} \text{ such that } n \in N_k \right\}$

and, for sets of symbolic expressions $\{S_i\}_{0 \le i < m}$,

$\prod_{0 \le i < m} S_i := \left\{ \prod_{0 \le i < m} s_i \;:\; s_i \in S_i \right\}$.
Finally, $N_{k+1}$ is deduced from $\widetilde{N}_{k+1}$ by adjunction of the missing primes,
which are suggested by the identification of gaps of size two between consecutive elements of
$\widetilde{N}_{k+1}$. Hence

(8)    $P_{k+1} = P_k \cup \left( N_{k+1} \setminus \widetilde{N}_{k+1} \right)$

so that for all $k \ge 0$ we have $P_k \subset N_k$ and

(9)    $N_k \subset N_{k+1}$.



Furthermore, we can use the Zeta recursion to iteratively construct larger and larger subsets of
the rational numbers, deducing the set $Q_k$ from the previously obtained sets $N_k$ and $P_k$ as follows



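(10)    $Q_k = \prod_{p \in P_k} \left( p^{-N_k} \cup \{1\} \cup p^{N_k} \right)$

where

(11)    $p^{-N_k} := \left\{ p^{-n} \text{ such that } n \in N_k \right\}$.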



This yields an alternative combinatorial proof of Cantor's result establishing that the rational
numbers are countable.

5.1. Improved Zeta recursion. Some slight modifications to the Zeta recursion have the benefit
of improving the computational performance of the recurrence computation.



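(12)    $P_0 := \{x\}, \quad N_0 := P_0 \cup \{1\}$;

for an arbitrary $q \in P_k$ we have

(13)    $N_{q,k+1} = \bigcup_{\substack{n \in N_k \\ q^{n} \le 2^{k+2}}} \left( \left[ 2^{k+1}, 2^{k+2} \right] \cap \left( q^{n} \times \prod_{\substack{p \in P_k \\ p < q}} \left( \{1\} \cup p^{N_k} \right) \right) \right)$

from which we have that

(14)    $\widetilde{N}_{k+1} = N_k \cup \left( \bigcup_{q \in P_k} N_{q,k+1} \right)$.

The completion of the set $\widetilde{N}_{k+1}$ to $N_{k+1}$ is still determined by sorting the
elements of the set $\bigcup_{q \in P_k} N_{q,k+1}$ and adjoining the missing primes, located by
identifying gaps of size two between consecutive elements of $\widetilde{N}_{k+1}$.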
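
A minimal Python sketch of one reading of the improved recursion follows. It assumes that
iteration k extends a table covering the integers 1 through $2^{k+1}$ to one covering 1 through
$2^{k+2}$, that prime powers up to and including $2^{k+2}$ are admitted, and that every gap of
size two in the new range marks a missing prime. The function name `improved_zeta`, the helper
`paren`, the dictionary from integers to SCF strings, and the plain-text syntax (`^`, `*`, `x`
for 1 + 1, trivial exponents omitted) are hypothetical choices for this example.

```python
def paren(s: str) -> str:
    """Parenthesize an expression unless it is a single symbol."""
    return s if s in ("1", "x") else f"({s})"


def improved_zeta(iterations: int):
    """Return (primes, table), where table maps each integer to an SCF string."""
    primes = [2]              # P_0 := {x}
    table = {1: "1", 2: "x"}  # N_0 := P_0 ∪ {1}
    for k in range(iterations):
        lo, hi = 2 ** (k + 1), 2 ** (k + 2)
        exponents = sorted(table)  # N_k supplies the admissible exponents
        smooth = {1: ""}           # products of primes p < q that do not exceed hi
        found = {}
        for q in primes:
            base = paren(table[q])
            layer = {}
            for n in exponents:
                power = q ** n
                if power > hi:
                    break
                term = base if n == 1 else f"{base}^{paren(table[n])}"
                for m, expr in smooth.items():
                    if m * power <= hi:
                        layer[m * power] = term if m == 1 else f"{expr} * {term}"
            # Keep only the part of this layer that lies in [lo, hi].
            found.update({v: e for v, e in layer.items() if v >= lo})
            smooth.update(layer)
        table.update(found)
        # Adjoin missing primes: gaps of size two between consecutive elements.
        values = sorted(v for v in table if v >= lo)
        for a, b in zip(values, values[1:]):
            if b - a == 2:
                table[a + 1] = f"1 + {table[a]}"
                primes.append(a + 1)
    return primes, table
```

For example, `improved_zeta(5)` returns a table for the integers 1 through 64 together with the
primes up to 61. The table must be kept in memory in full, since every entry may be reused as an
exponent or as the predecessor of a new prime.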

The improved Zeta recursion is not a particularly efficient algorithm for the sole purpose of
sifting primes, because it implicitly requires us to store rather large lists of integers. The
algorithm is, however, particularly well suited to the task of determining symbolic expressions
describing SCF formula-encodings for a relatively large set of consecutive integers, with the
initial sets for the iteration being

$P_0 := \{x\}, \quad N_0 := P_0 \cup \{1\}$.
For instance, seven iterations of the improved Zeta recursion yield

$\mathrm{SCF}_7 = \left\{ 1,\; x,\; (x + 1),\; x^{x},\; (x^{x} + 1),\; (x + 1) \cdot x,\; ((x + 1) \cdot x + 1),\; \cdots,\; (x + 1) \cdot (x^{x^{x}} + 1) \cdot (x^{x} + 1) \right\}$

It follows from the definition of the Zeta recursion that the proposed algorithm requires O (log n) 
iterations to determine SCF formulas for all positive integers less than n and the algorithm requires 

O (13^) symbolic expression manipulations. 